transformer language model
Limits of Transformer Language Models on Learning to Compose Algorithms
Thomm, Jonathan
We analyze the capabilities of Transformer language models in learning compositional discrete tasks. To this end, we evaluate training LLaMA models and prompting GPT-4 and Gemini on four tasks that require learning a composition of several discrete sub-tasks. In particular, we measure how well these models can reuse primitives observable in the sub-tasks to learn the composition task. Our results indicate that compositional learning in state-of-the-art Transformer language models is highly sample-inefficient: LLaMA requires more data samples to learn the compositional task than it would need to relearn all sub-tasks from scratch; in-context prompting with few samples is unreliable and fails at executing the sub-tasks or at correcting errors in multi-round code generation. Further, by leveraging complexity theory, we support these findings with a theoretical analysis focused on the sample inefficiency of gradient descent in memorizing feedforward models.
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Asia > Middle East > Jordan (0.04)
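The paper's four benchmarks are not reproduced here, but a minimal toy sketch conveys what "composition of sub-tasks" means: two string primitives observable in training data, and a compositional task that chains them. The primitives (`reverse`, `shift`) and the data format are illustrative assumptions, not the paper's actual tasks.

```python
import random
import string

# Two illustrative primitives (assumed for this sketch; not the paper's tasks).
def reverse(s: str) -> str:
    return s[::-1]

def shift(s: str) -> str:
    # Shift each letter one step forward in the alphabet, wrapping z -> a.
    return "".join(chr((ord(c) - 97 + 1) % 26 + 97) for c in s)

def sample(task: str, length: int = 5) -> tuple[str, str]:
    x = "".join(random.choices(string.ascii_lowercase, k=length))
    if task == "reverse":      # sub-task 1, observable in training data
        return x, reverse(x)
    if task == "shift":        # sub-task 2, observable in training data
        return x, shift(x)
    if task == "composition":  # target task: shift applied after reverse
        return x, shift(reverse(x))
    raise ValueError(task)

# A model that reuses the primitives should need few compositional samples;
# the abstract reports the opposite for LLaMA-style training.
print(sample("reverse"), sample("shift"), sample("composition"))
```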
Probability Distributions Computed by Hard-Attention Transformers
Yang, Andy, Svete, Anej, Li, Jiaoda, Lin, Anthony Widjaja, Rawski, Jonathan, Cotterell, Ryan, Chiang, David
Most expressivity results for transformers treat them as language recognizers (which accept or reject strings), and not as they are used in practice, as language models (which generate strings autoregressively and probabilistically). Here, we characterize the probability distributions that transformer language models can express. We show that making transformer language recognizers autoregressive can sometimes increase their expressivity, and that making them probabilistic can break equivalences that hold in the non-probabilistic case. Our overall contribution is to tease apart what functions transformers are capable of expressing, in their most common use-case as language models.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Switzerland (0.04)
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- Europe > Germany > Rhineland-Palatinate > Landau (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
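For intuition on "generating strings autoregressively and probabilistically": an autoregressive LM assigns a string the product of its next-symbol conditionals, with an end-of-string symbol making the distribution over finite strings well defined. The toy conditional table below is an assumption for illustration (and first-order for brevity, whereas a transformer conditions on the full prefix), not a construction from the paper.

```python
import math

# Toy next-symbol conditionals over the alphabet {a, b, EOS}, keyed by the
# previous symbol (BOS at the start). Illustrative numbers only.
P = {
    "BOS": {"a": 0.6, "b": 0.4},
    "a":   {"a": 0.2, "b": 0.5, "EOS": 0.3},
    "b":   {"a": 0.4, "b": 0.1, "EOS": 0.5},
}

def log_prob(s: str) -> float:
    """log p(s) = sum over positions of log p(w_t | prev) + log p(EOS | last)."""
    lp, prev = 0.0, "BOS"
    for c in s:
        lp += math.log(P[prev][c])
        prev = c
    return lp + math.log(P[prev]["EOS"])

print(math.exp(log_prob("ab")))  # p("ab") = 0.6 * 0.5 * 0.5 = 0.15
```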
Talking Heads: Understanding Inter-Layer Communication in Transformer Language Models
Although it is known that transformer language models (LMs) pass features from early layers to later layers, it is not well understood how this information is represented and routed by the model. We analyze a mechanism used in two LMs to selectively inhibit items in a context in one task, and find that it underlies a commonly used abstraction across many context-retrieval behaviors. Specifically, we find that models write into low-rank subspaces of the residual stream to represent features which are then read out by later layers, forming low-rank communication channels (Elhage et al., 2021) between layers. A particular 3D subspace in model activations in GPT-2 can be traversed to positionally index items in lists, and we show that this mechanism can explain an otherwise arbitrary-seeming sensitivity of the model to the order of items in the prompt. That is, the model has trouble copying the correct information from context when many items "crowd" this limited space.
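A minimal NumPy sketch of the low-rank read/write picture: an early layer adds a feature along a rank-3 basis V of the residual stream, and a later layer reads it back by projecting onto the same subspace. The width, rank, and random basis are assumptions for illustration; the paper identifies such subspaces empirically in GPT-2.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 3                      # residual width, channel rank (illustrative)

# Orthonormal basis for a rank-r subspace of the residual stream.
V, _ = np.linalg.qr(rng.normal(size=(d, r)))

def write(resid: np.ndarray, feature: np.ndarray) -> np.ndarray:
    # Early layer "writes" a low-dimensional feature into the subspace.
    return resid + V @ feature

def read(resid: np.ndarray) -> np.ndarray:
    # Later layer "reads" it back by projecting onto the same subspace.
    return V.T @ resid

resid = rng.normal(size=d)
feature = np.array([1.0, -2.0, 0.5])          # e.g., a positional index code
recovered = read(write(resid, feature))
print(np.round(recovered - read(resid), 6))   # equals feature: a faithful channel
# With only r dimensions, many items forced into this channel interfere,
# matching the "crowding" failure the abstract describes.
```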
Review for NeurIPS paper: Measuring Systematic Generalization in Neural Proof Generation with Transformers
Summary and Contributions: This paper evaluates how well Transformer language models can generate natural language expressions corresponding to first-order logical proofs, and their answers. Given a dataset of facts (tuples like entity1-relation1-entity2, entity2-relation2-entity3) and a query (entity1-?-entity3), the language model is trained on a sentence representing the facts, the query, a proof, and the answer. The proof is a chain of implications (for example, one step is "since entity1 is in relation1 with entity2 and entity2 is in relation2 with entity3, then entity1 is in relation2 with entity3"). The answer is the missing relation, such as relation2. The model can then be tested by presenting only the prefix of the expressions corresponding to the facts and the query (and perhaps the proof), and predicting the answer. The paper evaluates the ability of Transformer language models to generalize in several settings, determined by the number of relations.
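To make the data format the review describes concrete, here is a hypothetical serialization of one training example (facts, query, proof chain, answer); the paper's actual surface templates may differ.

```python
# Hypothetical serialization of one training example; the paper's actual
# templates may differ.
facts = [("entity1", "relation1", "entity2"),
         ("entity2", "relation2", "entity3")]
query = ("entity1", "?", "entity3")

def render(facts, query) -> str:
    fact_str = " ".join(f"{h} {r} {t}." for h, r, t in facts)
    proof = (f"since {facts[0][0]} is in {facts[0][1]} with {facts[0][2]} "
             f"and {facts[1][0]} is in {facts[1][1]} with {facts[1][2]}, "
             f"then {facts[0][0]} is in {facts[1][1]} with {facts[1][2]}.")
    answer = facts[1][1]  # the missing relation the model must predict
    return (f"facts: {fact_str} query: {' '.join(query)} "
            f"proof: {proof} answer: {answer}")

print(render(facts, query))
# At test time, the model sees only the prefix up to "answer:" (and perhaps
# the proof) and must generate the rest.
```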
MIMIC-IV-Ext-PE: Using a large language model to predict pulmonary embolism phenotype in the MIMIC-IV dataset
Lam, B. D., Ma, S., Kovalenko, I., Wang, P., Jafari, O., Li, A., Horng, S.
Pulmonary embolism (PE) is a leading cause of preventable in-hospital mortality. Advances in diagnosis, risk stratification, and prevention can improve outcomes. Few large, publicly available datasets contain PE labels for research. Using the MIMIC-IV database, we extracted all available radiology reports of computed tomography pulmonary angiography (CTPA) scans, and two physicians manually labeled each result as PE positive (acute PE) or PE negative. We then applied a previously finetuned Bio_ClinicalBERT transformer language model, VTE-BERT, to extract labels automatically. We verified VTE-BERT's reliability by measuring its performance against manual adjudication. We also compared the performance of VTE-BERT to diagnosis codes. We found that VTE-BERT has a sensitivity of 92.4% and a positive predictive value (PPV) of 87.8% on all 19,942 patients with CTPA radiology reports from the emergency room and/or hospital admission. In contrast, diagnosis codes have a sensitivity of 95.4% and a PPV of 83.8% on the subset of 11,990 hospitalized patients with discharge diagnosis codes. We add nearly 20,000 labels to CTPAs in a publicly available dataset and demonstrate the external validity of a semi-supervised language model in accelerating hematologic research.
- Asia > Middle East > Israel (0.05)
- North America > United States > Massachusetts > Suffolk County > Boston (0.05)
- North America > United States > Pennsylvania > Dauphin County > Harrisburg (0.04)
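For readers less familiar with the reported metrics: sensitivity is TP / (TP + FN) and PPV is TP / (TP + FP). The sketch below simply restates these definitions; the confusion-matrix counts are made up (chosen so the rates match the paper's aggregate 92.4% / 87.8%) and are not taken from the paper.

```python
def sensitivity(tp: int, fn: int) -> float:
    # Fraction of truly PE-positive reports the model recovers (recall).
    return tp / (tp + fn)

def ppv(tp: int, fp: int) -> float:
    # Fraction of model-flagged reports that are truly PE positive (precision).
    return tp / (tp + fp)

# Made-up counts for illustration; only the aggregate rates are reported
# in the paper.
tp, fp, fn = 924, 128, 76
print(f"sensitivity = {sensitivity(tp, fn):.1%}, PPV = {ppv(tp, fp):.1%}")
# sensitivity = 92.4%, PPV = 87.8%
```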
Investigating Low-Rank Training in Transformer Language Models: Efficiency and Scaling Analysis
Wei, Xiuying, Moalla, Skander, Pascanu, Razvan, Gulcehre, Caglar
State-of-the-art LLMs often rely on scale, at high computational cost, which has sparked a research agenda to reduce parameter counts and costs without significantly impacting performance. Our study focuses on Transformer-based LLMs, specifically applying low-rank parametrization to the computationally intensive feedforward networks (FFNs), which are less studied than attention blocks. In contrast to previous works, (i) we explore low-rank parametrization at scale, up to 1.3B parameters; (ii) within Transformer language models rather than convolutional architectures; and (iii) training from scratch. Experiments on the large RefinedWeb dataset show that low-rank parametrization is both efficient (e.g., a 2.6× FFN speed-up with 32% of the parameters) and effective during training. Interestingly, these structured FFNs exhibit steeper scaling curves than the original models. Motivated by this finding, we develop wide and structured networks that surpass current medium- and large-sized Transformers in both perplexity and throughput.
- Asia > Middle East > Jordan (0.05)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
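A minimal sketch of the low-rank FFN parametrization: each dense weight matrix is replaced by a rank-r product, cutting the FFN's parameters from 2·d_model·d_ff to 2·r·(d_model + d_ff). The sizes below are assumptions for illustration, not the paper's 1.3B configuration; with these sizes the ratio comes out near the 32% figure the abstract quotes for their setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, r = 1024, 4096, 256   # illustrative sizes, not the paper's

def relu(x):
    return np.maximum(x, 0.0)

# Dense FFN: x -> relu(x @ W1) @ W2
W1 = rng.normal(size=(d_model, d_ff)) * 0.02
W2 = rng.normal(size=(d_ff, d_model)) * 0.02

# Low-rank FFN: each weight matrix is factored into two thin matrices.
U1, V1 = rng.normal(size=(d_model, r)) * 0.02, rng.normal(size=(r, d_ff)) * 0.02
U2, V2 = rng.normal(size=(d_ff, r)) * 0.02, rng.normal(size=(r, d_model)) * 0.02

def ffn_dense(x):
    return relu(x @ W1) @ W2

def ffn_lowrank(x):
    return relu(x @ U1 @ V1) @ U2 @ V2

x = rng.normal(size=d_model)
assert ffn_dense(x).shape == ffn_lowrank(x).shape == (d_model,)

dense_params = W1.size + W2.size
lowrank_params = U1.size + V1.size + U2.size + V2.size
print(lowrank_params / dense_params)  # ~0.31 of the dense FFN parameters
```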
Limits of Transformer Language Models on Algorithmic Learning
Thomm, Jonathan, Terzic, Aleksandar, Karunaratne, Geethan, Camposampiero, Giacomo, Schölkopf, Bernhard, Rahimi, Abbas
We analyze the capabilities of Transformer language models in learning discrete algorithms. To this end, we introduce two new tasks demanding the composition of several discrete sub-tasks. Both when training LLaMA models from scratch and when prompting GPT-4 and Gemini, we measure how well these models learn compositions of learned primitives. We observe that the compositional capabilities of state-of-the-art Transformer language models are very limited, and that in terms of samples they scale worse than relearning all sub-tasks for a new algorithmic composition. We also present a complexity-theoretic theorem showing that gradient descent on memorizing feedforward models can be exponentially data-inefficient.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
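The comparison this abstract draws can be stated as a simple criterion: with N_comp the samples needed to reach a target accuracy on the composition and N_i the samples needed to learn sub-task i from scratch, reusing primitives pays off only if N_comp < sum of the N_i. A sketch of that criterion, with made-up sample counts:

```python
# Hedged illustration of the comparison criterion; all counts are made up.
def reuse_pays_off(n_composition: int, n_subtasks: list[int]) -> bool:
    # Reusing primitives helps only if the composition needs fewer samples
    # than relearning every sub-task from scratch.
    return n_composition < sum(n_subtasks)

print(reuse_pays_off(n_composition=120_000,
                     n_subtasks=[20_000, 30_000, 25_000]))
# False: the unfavorable regime the abstract reports for LLaMA-style training.
```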